Memory Augmented Neural Networks with Wormhole Connections

Authors

  • Çaglar Gülçehre
  • A. P. Sarath Chandar
  • Yoshua Bengio

Abstract

Recent empirical results on long-term dependency tasks have shown that neural networks augmented with an external memory can learn such tasks more easily and generalize better than vanilla recurrent neural networks (RNNs). We suggest that memory augmented neural networks (MANNs) can reduce the effects of vanishing gradients by creating shortcut (or wormhole) connections. Based on this observation, we propose a novel memory augmented neural network model called TARDIS (Temporal Automatic Relation Discovery in Sequences). The controller of TARDIS can store a selective set of embeddings of its own previous hidden states in an external memory and revisit them as and when needed. For TARDIS, the memory acts as a store of wormhole connections to the past, which propagate gradients more effectively and help the model learn temporal dependencies. The memory structure of TARDIS is similar to both Neural Turing Machines (NTM) and Dynamic Neural Turing Machines (D-NTM), but its read and write operations are simpler and more efficient. We use discrete addressing for read/write operations, which helps substantially reduce the vanishing gradient problem on very long sequences. Once the memory becomes full, read and write operations in TARDIS are tied by a heuristic, which makes the learning problem simpler than in NTM- or D-NTM-type architectures. We provide a detailed analysis of gradient propagation in MANNs in general. We evaluate our models on different long-term dependency tasks and report competitive results on all of them.
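The memory behavior described above (store selected embeddings of past hidden states, address slots discretely, tie reads and writes once the memory is full) can be pictured with a toy Python sketch. This is a hypothetical illustration, not the TARDIS implementation: the class name `DiscreteWormholeMemory`, the dot-product scoring used to pick a slot, and the reading of the "tied" heuristic as "once full, overwrite the slot that was just read" are all assumptions made here. How gradients are trained through the hard slot choice is not modeled.

```python
from typing import List, Optional


class DiscreteWormholeMemory:
    """Toy external memory with hard (discrete) read/write addressing.

    Hypothetical sketch: dot-product slot scoring and the "overwrite the
    slot just read once full" rule are illustrative assumptions, not the
    exact TARDIS formulation.
    """

    def __init__(self, num_slots: int, dim: int) -> None:
        self.num_slots = num_slots
        self.dim = dim
        self.slots: List[List[float]] = []  # stored embeddings of past hidden states
        self.last_read: Optional[int] = None  # slot chosen by the most recent read

    def is_full(self) -> bool:
        return len(self.slots) == self.num_slots

    def read(self, query: List[float]) -> List[float]:
        """Return a copy of the single best-scoring slot (hard argmax, no soft weights)."""
        if not self.slots:
            self.last_read = None
            return [0.0] * self.dim  # nothing stored yet: return a zero vector
        scores = [sum(q * m for q, m in zip(query, slot)) for slot in self.slots]
        self.last_read = max(range(len(scores)), key=scores.__getitem__)
        return list(self.slots[self.last_read])

    def write(self, embedding: List[float]) -> int:
        """Store an embedding of the current hidden state; return the slot index used."""
        if not self.is_full():
            self.slots.append(list(embedding))  # fill empty slots in order
            return len(self.slots) - 1
        # Assumed tied heuristic: once full, overwrite the slot that was just read.
        # Falling back to slot 0 when nothing has been read yet is arbitrary.
        idx = self.last_read if self.last_read is not None else 0
        self.slots[idx] = list(embedding)
        return idx


if __name__ == "__main__":
    mem = DiscreteWormholeMemory(num_slots=2, dim=2)
    mem.write([1.0, 0.0])           # slot 0
    mem.write([0.0, 1.0])           # slot 1; memory is now full
    print(mem.read([0.2, 0.9]))     # best match is slot 1 -> [0.0, 1.0]
    print(mem.write([0.5, 0.5]))    # tied write overwrites slot 1 -> 1
```

Because `read` returns one stored hidden state verbatim rather than a soft mixture over all slots, feeding that vector into the controller's next update gives a direct path back to an arbitrarily old time step. This is one way to picture the "wormhole" connections the abstract refers to.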

Similar resources

Role of STDP in regulation of neural timing networks in human: a simulation study

Many physiological events require an accurate timing signal, usually generated by neural networks called central pattern generators (CPGs). On the other hand, properties of neurons and neural networks (e.g. time constants of neurons and weights of network connections) alter with time, resulting in gradual changes in timing of such networks. Recently, a synaptic weight adjustment mechanism has b...

The Supercomputer Supernet (SSN): A High-Speed Electro-Optic Campus and Metropolitan Network

The Supercomputer Supernet (SSN) is a high-performance, scalable optical interconnection network for supercomputers and workstation clusters based on asynchronous, wormhole-routing switches. The WDM optical backbone extends the geographic coverage range from interdepartmental to campus and even to metropolitan areas with dynamically reconfigurable direct or multi-hop connections. The network provides ver...

Visual Question Answering with Memory-Augmented Networks

In this paper, we exploit memory-augmented neural networks to predict accurate answers to visual questions, even when those answers rarely occur in the training set. The memory network incorporates both internal and external memory blocks and selectively pays attention to each training exemplar. We show that memory-augmented neural networks are able to maintain a relatively long-term memory of ...

Dynamic Sliding Mode Control of Nonlinear Systems Using Neural Networks

Dynamic sliding mode control (DSMC) of nonlinear systems using neural networks is proposed. In DSMC, chattering is removed by the integrator placed before the input control signal of the plant. However, in DSMC the augmented system is one dimension larger than the actual system, i.e. the augmented system has more states than the actual system, and then to control such ...

Journal title:
  • CoRR

Volume: abs/1701.08718

Publication date: 2017